Abstract

We consider the problem of estimating the support of a vector $\beta^* \in \mathbb{R}^p$ based on
observations contaminated by noise. A significant body of work has studied the behavior of
$\ell_1$-relaxations when applied to measurement matrices drawn from standard dense ensembles
(e.g., Gaussian, Bernoulli). In this paper, we analyze sparsified measurement ensembles, and
consider the trade-off between measurement sparsity, as measured by the fraction $\gamma$ of
non-zero entries, and the statistical efficiency, as measured by the minimal number of
observations $n$ required for exact support recovery with probability converging to one. Our main
result is to prove that it is possible to let $\gamma \to 0$ at some rate, yielding measurement
matrices with a vanishing fraction of non-zeros per row while retaining the same statistical
efficiency as dense ensembles. A variety of simulation results confirm the sharpness of our
theoretical predictions.

Keywords: Quadratic programming; Lasso; subset selection; consistency; thresholds; sparse
approximation; signal denoising; sparsity recovery; $\ell_1$-regularization; model selection

1 Introduction 

Recent years have witnessed a flurry of research on the recovery of high-dimensional sparse
signals (e.g., compressed sensing [2], [6], [18], graphical model selection [I3l[l3], and sparse
approximation [18]). In all of these settings, the basic problem is to recover information about a
high-dimensional signal $\beta^* \in \mathbb{R}^p$, based on a set of $n$ observations. The signal
$\beta^*$ is assumed a priori to be sparse: either exactly $k$-sparse, or lying within some
$\ell_q$-ball with $q < 1$. A large body of theory has focused on the behavior of various
$\ell_1$-relaxations when applied to measurement matrices drawn from the standard Gaussian
ensemble [SI [2], or more general random ensembles satisfying mutual incoherence conditions
[131 HO].

These standard random ensembles are dense, in that the number of non-zero entries per mea- 
surement vector is of the same order as the ambient signal dimension. Such dense measurement 
matrices are undesirable for practical applications (e.g., sensor networks), in which it would be 
preferable to take measurements based on sparse inner products. Sparse measurement matrices 
require significantly less storage space, and have the potential for reduced algorithmic complexity 
for signal recovery, since many algorithms for linear programming, and conic programming more 
generally [1], can be accelerated by exploiting problem structure. With this motivation, a body
of past work (e.g. [U [HI [16] [23]), motivated by group testing or coding perspectives, has studied
compressed sensing methods based on sparse measurement ensembles. However, this body of work 
has focused on the case of noiseless observations. 

In contrast, this paper focuses on observations contaminated by additive noise which, as we
show, exhibits fundamentally different behavior than the noiseless case. Our interest is not in
sparse measurement ensembles alone, but rather in understanding the trade-off between the degree
of measurement sparsity and the statistical efficiency. We assess measurement sparsity in terms
of the fraction $\gamma$ of non-zero entries in any particular row of the measurement matrix, and
we define statistical efficiency in terms of the minimal number of measurements $n$ required to
recover the correct support with probability converging to one. Our interest can be viewed in
terms of experimental design. More precisely, we ask: what degree of measurement sparsity can be
permitted without any compromise in the statistical efficiency? To bring sharp focus to the issue,
we analyze this question for exact subset recovery using $\ell_1$-constrained quadratic
programming, also known as the Lasso in the statistics literature O [17], where past work on dense
Gaussian measurement ensembles [20] provides a precise characterization of its success/failure. We
characterize the density of our measurement ensembles with a positive parameter
$\gamma \in (0, 1]$, corresponding to the fraction of non-zero entries per row. We first show that
for all fixed $\gamma \in (0, 1]$, the statistical efficiency of the Lasso remains the same as with
dense measurement matrices. We then prove that it is possible to let $\gamma \to 0$ at some rate,
as a function of the sample size $n$, signal length $p$ and signal sparsity $k$, yielding
measurement matrices with a vanishing fraction of non-zeroes per row while requiring exactly the
same number of observations as dense measurement ensembles. In general, in contrast to the
noiseless setting [23], our theory still requires that the average number of non-zeroes per column
of the measurement matrix (i.e., $\gamma n$) tend to infinity; however, under the loss function
considered here (exact signed support recovery), we prove that no method can succeed with
probability one if this condition does not hold. The remainder of this paper is organized as
follows. In Section 2, we set up the problem more precisely, state our main result, and discuss
some of its implications. In Section 3, we provide a high-level outline of the proof.

Work in this paper was presented in part at the International Symposium on Information Theory 
in Toronto, Canada (July, 2008). We note that in concurrent and complementary work, Wang et 
al. [22] have analyzed the information-theoretic limitations of sparse measurement matrices for
exact support recovery. 

Notation: Throughout this paper, we use the following standard asymptotic notation:
$f(n) = O(g(n))$ if $f(n) \le C g(n)$ for some constant $C < +\infty$; $f(n) = \Omega(g(n))$ if
$f(n) \ge c g(n)$ for some constant $c > 0$; and $f(n) = \Theta(g(n))$ if $f(n) = O(g(n))$ and
$f(n) = \Omega(g(n))$.

2 Problem set-up and main result 

We begin by setting up the problem, stating our main result, and discussing some of its
consequences.

2.1 Problem formulation



Let $\beta^* \in \mathbb{R}^p$ be a fixed but unknown vector, with at most $k$ non-zero entries
($k \le p/2$), and define its support set

$$S := \{ i \in \{1, \ldots, p\} \mid \beta^*_i \neq 0 \}. \tag{1}$$

We use $\beta_{\min}$ to denote the minimum value of $|\beta^*|$ on its support, that is,
$\beta_{\min} := \min_{i \in S} |\beta^*_i|$.

Suppose that we make a set $\{Y_1, \ldots, Y_n\}$ of $n$ independent and identically distributed
(i.i.d.) observations of the unknown vector $\beta^*$, each of the form

$$Y_i := x_i^T \beta^* + W_i, \tag{2}$$

where $W_i \sim N(0, \sigma^2)$ is observation noise, and $x_i \in \mathbb{R}^p$ is a measurement
vector. It is convenient to use $Y = [Y_1 \; Y_2 \; \cdots \; Y_n]^T$ to denote the $n$-vector of
measurements, with similar notation for the noise vector $W \in \mathbb{R}^n$, and

$$X = \begin{bmatrix} x_1^T \\ \vdots \\ x_n^T \end{bmatrix} = [X_1 \; X_2 \; \cdots \; X_p] \tag{3}$$

to denote the $n \times p$ measurement matrix. With this notation, the observation model can be
written compactly as $Y = X\beta^* + W$.

Given some estimate $\hat{\beta}$, its error relative to the true $\beta^*$ can be assessed in
various ways, depending on the underlying application of interest. For applications in compressed
sensing, various types of $\ell_q$ norms (i.e., $\mathbb{E}\|\hat{\beta} - \beta^*\|_q$) are
well-motivated, whereas for statistical prediction, it is most natural to study a predictive loss
(e.g., $\mathbb{E}\|X\hat{\beta} - X\beta^*\|$). For reasons of scientific interpretation or for
model selection purposes, the object of primary interest is the support $S$ of $\beta^*$. In this
paper, we consider a slightly stronger notion of model selection: in particular, our goal is to
recover the signed support of the unknown $\beta^*$, as defined by the vector
$\mathbb{S}(\beta^*)$ with elements

$$\mathbb{S}(\beta^*)_i := \begin{cases} \operatorname{sign}(\beta^*_i) & \text{if } \beta^*_i \neq 0 \\ 0 & \text{otherwise.} \end{cases}$$

Given some estimate $\hat{\beta}$, we study the probability
$\mathbb{P}[\mathbb{S}(\hat{\beta}) = \mathbb{S}(\beta^*)]$ that it correctly specifies the
signed support.

The estimator that we analyze is $\ell_1$-constrained quadratic programming (QP), also known as
the Lasso [17] in the statistics literature. The Lasso generates an estimate $\hat{\beta}$ by
solving the regularized QP

$$\hat{\beta} = \arg\min_{\beta \in \mathbb{R}^p} \left\{ \frac{1}{2n} \|Y - X\beta\|_2^2 + \rho_n \|\beta\|_1 \right\}, \tag{4}$$

where $\rho_n > 0$ is a user-defined regularization parameter. A large body of past work has
focused on the behavior of the Lasso for both deterministic and random measurement matrices (e.g.,
[5], [13], [18], [20]). Most relevant here is the sharp threshold [20] characterizing the
success/failure of the Lasso when applied to measurement matrices $X$ drawn randomly from the
standard Gaussian ensemble (i.e., each element $X_{ij} \sim N(0, 1)$ i.i.d.). In particular, the
Lasso undergoes a sharp threshold as a function of the control parameter

$$\theta(n, p, k) := \frac{n}{2k \log(p - k)}. \tag{5}$$

For the standard Gaussian ensemble and sequences $(n, p, k)$ such that $\theta(n, p, k) > 1$, the
probability of Lasso success goes to one, whereas it converges to zero for sequences for which
$\theta(n, p, k) < 1$. The main contribution of this paper is to show that the same sharp
threshold holds for $\gamma$-sparsified measurement ensembles, including a subset for which
$\gamma \to 0$, so that each row of the measurement matrix has a vanishing fraction of non-zero
entries.
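
To make the role of the control parameter (5) concrete, the following minimal sketch computes $\theta(n, p, k)$ and the smallest sample size that reaches a target value of $\theta$. The function names are hypothetical and chosen only for illustration.

```python
import math


def control_parameter(n: int, p: int, k: int) -> float:
    """theta(n, p, k) = n / (2 k log(p - k)), as in equation (5)."""
    if not 0 < k < p - 1:
        raise ValueError("need 0 < k < p - 1 so that log(p - k) > 0")
    return n / (2.0 * k * math.log(p - k))


def samples_for_theta(theta: float, p: int, k: int) -> int:
    """Smallest integer n such that theta(n, p, k) >= theta."""
    return math.ceil(theta * 2.0 * k * math.log(p - k))
```

For example, `samples_for_theta(1.0, 512, 8)` gives the sample size at which the threshold $\theta = 1$ is first reached for $p = 512$ and $k = 8$.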

2.2 Statement of main result 

A measurement matrix $X \in \mathbb{R}^{n \times p}$ drawn randomly from a Gaussian ensemble is
dense, in that each row has $\Theta(p)$ non-zero entries. The main focus of this paper is the
observation model (2), using measurement ensembles that are designed to be sparse. To formalize
the notion of sparsity, we let $\gamma \in (0, 1]$ represent a measurement sparsity parameter,
corresponding to the (average) fraction of non-zero entries per row. Our analysis allows the
sparsity parameter $\gamma(n, p, k)$ to be a function of the triple $(n, p, k)$, but we typically
suppress this explicit dependence so as to simplify notation.
For a given choice of $\gamma$, we consider measurement matrices $X$ with i.i.d. entries of the
form

$$X_{ij} \overset{d}{=} \begin{cases} Z \sim N(0, 1) & \text{with probability } \gamma \\ 0 & \text{with probability } 1 - \gamma. \end{cases} \tag{6}$$

By construction, the expected number of non-zero entries in each row of $X$ is $\gamma p$. It is
straightforward to verify that for any constant setting of $\gamma$, elements $X_{ij}$ from the
ensemble (6) are sub-Gaussian. (A zero-mean random variable $Z$ is sub-Gaussian [10] if there
exists some constant $C > 0$ such that $\mathbb{P}[|Z| > t] \le 2\exp(-Ct^2)$ for all $t > 0$.)
For this reason, one would expect such ensembles to obey similar scaling behavior as Gaussian
ensembles, although possibly with different constants. In fact, the analysis of this paper
establishes exactly the same control parameter threshold (5) for $\gamma$-sparsified measurement
ensembles, for any fixed $\gamma \in (0, 1)$, as the completely dense case ($\gamma = 1$). On the
other hand, if $\gamma$ is allowed to tend to zero, elements of the measurement matrix are no
longer sub-Gaussian with any fixed constant, since the variance of the Gaussian mixture component
scales non-trivially. Nonetheless, our analysis shows that for $\gamma \to 0$ suitably slowly, it
is possible to achieve the same statistical efficiency as the dense case.

In particular, we state the following result on conditions under which the Lasso applied to
sparsified ensembles has the same sample complexity as when applied to the dense (standard
Gaussian) ensemble:

Theorem 1. Suppose that the measurement matrix $X \in \mathbb{R}^{n \times p}$ is drawn with
i.i.d. entries according to the $\gamma$-sparsified distribution (6). Then for any $\epsilon > 0$,
if the sample size satisfies

$$n > (2 + \epsilon)\, k \log(p - k), \tag{7}$$

then the Lasso succeeds with probability one as $(n, p, k) \to +\infty$ in recovering the correct
signed support as long as

$$\frac{n \rho_n^2 \gamma}{\log(p - k)} \to \infty, \tag{8a}$$

$$\frac{\rho_n}{\beta_{\min}} \left( 1 + \frac{\sqrt{k}}{\gamma} \sqrt{\frac{\log\log(p - k)}{\log(p - k)}} \right) \to 0, \tag{8b}$$

$$\gamma^3 \min\left\{ k, \frac{\log(p - k)}{\log\log(p - k)} \right\} \to \infty. \tag{8c}$$
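
As a small illustration of the sparsified ensemble, the sketch below draws an $n \times p$ matrix whose entries are non-zero with probability $\gamma$ and checks the average number of non-zeros per row against $\gamma p$. It assumes that the non-zero entries are standard Gaussian; the function names are hypothetical, and only the Python standard library is used.

```python
import random


def sparsified_matrix(n, p, gamma, rng):
    """n x p matrix with i.i.d. entries: N(0, 1) with probability gamma, else 0.

    The standard-normal law for the non-zero entries is an assumption of this sketch.
    """
    return [[rng.gauss(0.0, 1.0) if rng.random() < gamma else 0.0
             for _ in range(p)]
            for _ in range(n)]


def mean_nonzeros_per_row(X):
    """Average count of non-zero entries over the rows of X."""
    return sum(sum(1 for x in row if x != 0.0) for row in X) / len(X)
```

With `gamma = 0.1` and `p = 400`, the average returned by `mean_nonzeros_per_row` should be close to $\gamma p = 40$.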



Remarks: 

(a) To provide intuition for Theorem 1, it is helpful to consider various special cases of the
sparsity parameter $\gamma$. First, if $\gamma$ is a constant fixed to some value in $(0, 1]$,
then it plays no role in the scaling, and condition (8c) is always satisfied. Furthermore,
condition (8a) is then exactly the same as that from previous work [20] on dense measurement
ensembles ($\gamma = 1$). However, condition (8b) is slightly weaker than the corresponding
condition from [20] in that $\beta_{\min}$ must approach zero more slowly. Depending on the exact
behavior of $\beta_{\min}$, choosing $\rho_n^2$ to decay slightly more slowly than $\log p / n$ is
sufficient to guarantee exact recovery with $n = \Theta(k \log(p - k))$, meaning that we recover
exactly the same statistical efficiency as the dense case ($\gamma = 1$) for all constant
measurement sparsities $\gamma \in (0, 1)$. At least initially, one might think that reducing
$\gamma$ should increase the required number of observations, since it effectively reduces the
signal-to-noise ratio by a factor of $\gamma$. However, under high-dimensional scaling
($p \to +\infty$), the dominant effect limiting the Lasso performance is the number $(p - k)$ of
irrelevant factors, as opposed to the signal-to-noise ratio (scaling of the minimum).



(b) However, Theorem 1 also allows for general scalings of the measurement sparsity $\gamma$ along
with the triplet $(n, p, k)$. More concretely, let us suppose for simplicity that
$\beta_{\min} = \Theta(1)$. Then over a range of signal sparsities (say $k = \alpha p$,
$k = \Theta(\sqrt{p})$ or $k = \Theta(\log(p - k))$, corresponding respectively to linear
sparsity, polynomial sparsity, and exponential sparsity), we can choose a decaying measurement
sparsity, for instance



log log {p — k) 
log (p — k) 



(9) 



along with the regularization parameter 



log (p-k) I log(p-fc) 



while maintaining the same 



log log {p—k) 

sample complexity (required number of observations for support recovery) as the Lasso with dense 
measurement matrices. 



(c) Of course, the conditions of Theorem 1 do not allow the measurement sparsity $\gamma$ to
approach zero arbitrarily quickly. Rather, for any $\gamma$ guaranteeing exact recovery,
condition (8a) implies that the average number of non-zero entries per column of $X$ (namely,
$\gamma n$) must tend to infinity. (Indeed, with $n = \Omega(k \log(p - k))$, our specific choice
(9) certainly satisfies this constraint.) A natural question is whether exact recovery is possible
using measurement matrices, either randomly drawn or deterministically designed, with the average
number of non-zeros per column (namely $\gamma n$) remaining bounded. In fact, under the criterion
of exactly recovering the signed support, no method can succeed with probability one if
$\gamma n \beta_{\min}^2$ remains bounded.

Proposition 1. If $\gamma n \beta_{\min}^2$ does not tend to infinity, then no method can recover
the signed support with probability one.

Proof. We construct a sub-problem that must be solvable by any method capable of performing
exact signed support recovery. Suppose that $\beta^*_1 = \beta_{\min} \neq 0$ and that the column
$X_1$ has $n_1$ non-zero entries, say without loss of generality indices $i = 1, \ldots, n_1$. Now
consider the problem of recovering the sign of $\beta^*_1$. Let us extract the observations
$i = 1, \ldots, n_1$ that explicitly involve $\beta^*_1$, writing

$$Y_i = X_{i1} \beta^*_1 + \sum_{j \in T(i)} X_{ij} \beta^*_j + W_i, \qquad i = 1, \ldots, n_1, \tag{10}$$

where $T(i)$ denotes the set of indices in row $i$ for which $X_{ij}$ is non-zero, excluding
index 1. Even assuming that $\{\beta^*_j, j \in T(i)\}$ were perfectly known, this observation
model (10) is at best equivalent to observing $\beta^*_1$ contaminated by constant variance
additive Gaussian noise, and our task is to distinguish whether $\beta^*_1 = \beta_{\min}$ or
$\beta^*_1 = -\beta_{\min}$. The average
$\bar{Y} = \frac{1}{n_1} \sum_{i=1}^{n_1} \big[ Y_i - \sum_{j \in T(i)} X_{ij} \beta^*_j \big]$ is a
sufficient statistic, following the distribution $\bar{Y} \sim N(\beta_{\min}, \sigma^2 / n_1)$.
Unless the effective signal-to-noise ratio, which is of the order $n_1 \beta_{\min}^2$, goes to
infinity, there will always be a constant probability of error in distinguishing
$\beta^*_1 = \beta_{\min}$ from $\beta^*_1 = -\beta_{\min}$. Under the $\gamma$-sparsified random
ensemble, we have $n_1 \le (1 + o(1)) \gamma n$ with high probability, so that no method can
succeed unless $\gamma n \beta_{\min}^2$ goes to infinity, as claimed. □

Note that the conditions in Theorem 1 imply that $n \gamma \beta_{\min}^2 \to +\infty$. In
particular, condition (8b) implies that $\rho_n^2 = o(\beta_{\min}^2)$, and condition (8a) implies
that $n \gamma \rho_n^2 \to +\infty$, which implies the condition of Proposition 1.
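
The argument behind Proposition 1 can be illustrated numerically. The sketch below assumes the idealized sub-problem from the proof: $n_1$ direct observations of $\pm\beta_{\min}$ in Gaussian noise of variance $\sigma^2$, decided by the sign of their average. Under that assumption the error probability is a Gaussian tail that stays bounded away from zero unless $n_1 \beta_{\min}^2$ grows. The function name is hypothetical.

```python
import math


def sign_error_probability(n1: int, beta_min: float, sigma: float) -> float:
    """Error probability of deciding sign(beta_1) from the average of n1 observations.

    Idealized model (an assumption of this sketch): the average is
    N(+/- beta_min, sigma^2 / n1), so the error equals Q(sqrt(n1) * beta_min / sigma).
    """
    snr = n1 * beta_min ** 2 / sigma ** 2
    return 0.5 * math.erfc(math.sqrt(snr / 2.0))
```

If $n_1 \beta_{\min}^2 / \sigma^2$ is held at 1 while $n_1$ grows, the error stays at roughly 0.16, which is the constant probability of error referred to in the proof.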



3 Proof of Theorem 1

This section is devoted to the proof of Theorem 1. We begin with a high-level outline of the
proof; as with previous work on dense Gaussian ensembles [20], the key is the notion of a
primal-dual witness for exact signed support recovery. We then proceed with the proof, divided
into a sequence of separate lemmas. Analysis of "sparsified" matrices requires results on spectral
properties of random matrices not covered by the standard literature. The proofs of some of the
more technical results are deferred to the appendices.



3.1 High-level overview of proof 

For the purposes of our proof, it is convenient to consider matrices
$X \in \mathbb{R}^{n \times p}$ with i.i.d. entries of the form

$$X_{ij} \overset{d}{=} \begin{cases} Z \sim N(0, \tfrac{1}{\gamma}) & \text{with probability } \gamma \\ 0 & \text{with probability } 1 - \gamma. \end{cases}$$

So as to obtain an equivalent observation model, we also reset the variance of each noise term
$W_i$ to be $\sigma^2 / \gamma$. Finally, we can assume without loss of generality that
$\operatorname{sign}(\beta^*_S) = \vec{1} \in \mathbb{R}^k$.

Define the sample covariance matrix

$$\hat{\Sigma} := \frac{1}{n} X^T X = \frac{1}{n} \sum_{i=1}^{n} x_i x_i^T. \tag{11}$$

Of particular importance to our analysis is the $k \times k$ sub-matrix $\hat{\Sigma}_{SS}$. For
future reference, we state the following claim, proved in Appendix D.

Lemma 1. Under the conditions of Theorem 1, the submatrix $\hat{\Sigma}_{SS}$ is invertible with probability
greater than 1 — 0{j^^^j^). 

The foundation of our proof is the following lemma: it provides sufficient conditions for the
Lasso (4) to recover the signed support set.

Lemma 2 (Primal-dual conditions for support recovery). Suppose that
$\hat{\Sigma}_{SS} \succ 0$, and that we can find a primal vector
$\hat{\beta} \in \mathbb{R}^p$, and a subgradient vector $z \in \mathbb{R}^p$ that satisfy the
zero-subgradient condition

$$\hat{\Sigma}(\hat{\beta} - \beta^*) - \frac{1}{n} X^T W + \rho_n z = 0, \tag{12}$$

and the signed-support-recovery conditions

$$z_i = \operatorname{sign}(\beta^*_i) \quad \text{for all } i \in S, \tag{13a}$$
$$\hat{\beta}_j = 0 \quad \text{for all } j \in S^c, \tag{13b}$$
$$|z_j| < 1 \quad \text{for all } j \in S^c, \text{ and} \tag{13c}$$
$$\operatorname{sign}(\hat{\beta}_i) = \operatorname{sign}(\beta^*_i) \quad \text{for all } i \in S. \tag{13d}$$

Then $\hat{\beta}$ is the unique optimal solution to the Lasso and recovers the correct signed
support.

See Appendix B.1 for the proof of this claim.

Thus, given Lemmas 1 and 2, it suffices to show that under the specified scaling of $(n, p, k)$,
there exists a primal-dual pair $(\hat{\beta}, z)$ satisfying the conditions of Lemma 2. We
establish the existence of such a pair with the following constructive procedure:

(a) We begin by setting $\hat{\beta}_{S^c} = 0$, and $z_S = \operatorname{sign}(\beta^*_S)$.

(b) Next we determine $\hat{\beta}_S$ by solving the linear system

$$\hat{\Sigma}_{SS}(\hat{\beta}_S - \beta^*_S) - \frac{1}{n} X_S^T W + \rho_n \operatorname{sign}(\beta^*_S) = 0. \tag{14}$$

(c) Finally, we determine $z_{S^c}$ by solving the linear system:

$$-\rho_n z_{S^c} = \hat{\Sigma}_{S^c S}(\hat{\beta}_S - \beta^*_S) - \frac{1}{n} X_{S^c}^T W. \tag{15}$$

By construction, this procedure satisfies the zero sub-gradient condition (12), as well as
auxiliary conditions (13a) and (13b); it remains to verify conditions (13c) and (13d).
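
The constructive procedure in steps (a) through (c) can be sketched in code. The block below is a minimal, unoptimized illustration that uses only the standard library. It assumes the model $Y = X\beta^* + W$ and the sign convention in which the zero-subgradient condition reads $\frac{1}{n} X^T (X\hat{\beta} - Y) + \rho_n z = 0$. The helper names `solve`, `witness_succeeds` and `demo` are hypothetical.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    M = [list(A[i]) + [b[i]] for i in range(m)]
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, m):
            f = M[r][c] / M[c][c]
            for j in range(c, m + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        tail = sum(M[r][j] * x[j] for j in range(r + 1, m))
        x[r] = (M[r][m] - tail) / M[r][r]
    return x


def witness_succeeds(X, W, beta_star, S, rho):
    """Run steps (a)-(c) and check conditions (13c) and (13d)."""
    n, p = len(X), len(X[0])
    # Step (a): beta_hat is zero off the support and z_S = sign(beta*_S).
    s = [1.0 if beta_star[a] > 0 else -1.0 for a in S]
    # Step (b): solve Sigma_SS U = X_S^T W / n - rho * s for U = beta_hat_S - beta*_S.
    Sigma = [[sum(X[l][a] * X[l][b] for l in range(n)) / n for b in S] for a in S]
    rhs = [sum(X[l][a] * W[l] for l in range(n)) / n - rho * s[i]
           for i, a in enumerate(S)]
    U = solve(Sigma, rhs)
    beta_hat_S = [beta_star[a] + u for a, u in zip(S, U)]
    # Step (c): rho * z_j = X_j^T (W - X_S U) / n for every j outside S.
    resid = [W[l] - sum(X[l][a] * u for a, u in zip(S, U)) for l in range(n)]
    support = set(S)
    z_max = max(abs(sum(X[l][j] * resid[l] for l in range(n))) / (n * rho)
                for j in range(p) if j not in support)
    signs_ok = all(b * si > 0 for b, si in zip(beta_hat_S, s))   # condition (13d)
    return z_max < 1.0 and signs_ok                              # condition (13c)


def demo(seed=0):
    """Noiseless dense Gaussian example with n = 200, p = 20, k = 2."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(20)] for _ in range(200)]
    beta_star = [1.0, 1.0] + [0.0] * 18
    return witness_succeeds(X, [0.0] * 200, beta_star, [0, 1], 0.05)
```

The function returns `True` exactly when the constructed pair passes strict dual feasibility and the sign check, which by Lemma 2 certifies signed support recovery.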

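In order to complete these final two steps, it is helpful to define the following random
variables:

$$V_j^a := X_j^T \Big[ \frac{1}{n} X_S (\hat{\Sigma}_{SS})^{-1} \vec{1} \Big] \rho_n, \tag{16a}$$

$$V_j^b := X_j^T \Big[ I_{n \times n} - \frac{1}{n} X_S (\hat{\Sigma}_{SS})^{-1} X_S^T \Big] \frac{W}{n}, \tag{16b}$$

$$U_i := e_i^T (\hat{\Sigma}_{SS})^{-1} \Big[ \frac{1}{n} X_S^T W - \rho_n \vec{1} \Big], \tag{16c}$$

where $e_i \in \mathbb{R}^k$ is the unit vector with one in position $i$, and
$\vec{1} \in \mathbb{R}^k$ is the all-ones vector. A little bit of algebra (see Appendix B.2 for
details) shows that $\rho_n z_j = V_j^a + V_j^b$, and that $U_i = \hat{\beta}_i - \beta^*_i$.
Consequently, if we define the events

$$\mathcal{E}(V) := \Big\{ \max_{j \in S^c} |V_j^a + V_j^b| < \rho_n \Big\}, \tag{17a}$$

$$\mathcal{E}(U) := \Big\{ \max_{i \in S} |U_i| \le \beta_{\min} \Big\}, \tag{17b}$$

where the minimum value $\beta_{\min}$ was defined previously as the minimum value of
$|\beta^*|$ on its support, then in order to establish that the Lasso succeeds in recovering the
exact signed support, it suffices to show that
$\mathbb{P}[\mathcal{E}(V) \cap \mathcal{E}(U)] \to 1$.

We decompose the proof of this final claim in the following three lemmas. As in the statement
of Theorem 1, suppose that $n > (2 + \epsilon) k \log(p - k)$, for some fixed $\epsilon > 0$.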

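Lemma 3 (Control of $V^a$). Under the conditions of Theorem 1, we have

$$\mathbb{P}\Big[ \max_{j \in S^c} |V_j^a| \ge (1 - \delta) \rho_n \Big] \to 0. \tag{18}$$

Lemma 4 (Control of $V^b$). Under the conditions of Theorem 1, we have

$$\mathbb{P}\Big[ \max_{j \in S^c} |V_j^b| \ge \delta \rho_n \Big] \to 0. \tag{19}$$

Lemma 5 (Control of $U$). Under the conditions of Theorem 1, we have

$$\mathbb{P}[(\mathcal{E}(U))^c] = \mathbb{P}\Big[ \max_{i \in S} |U_i| > \beta_{\min} \Big] \to 0. \tag{20}$$

3.2 Proof of Lemma 3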

We assume throughout that $\hat{\Sigma}_{SS}$ is invertible, an event which occurs with
probability $1 - o(1)$ under the stated assumptions (see Lemma 1). If we define the
$n$-dimensional vector

$$h := \frac{1}{n} X_S (\hat{\Sigma}_{SS})^{-1} \vec{1}, \tag{21}$$

then the variable $V_j^a$ can be written compactly as

$$\frac{V_j^a}{\rho_n} = X_j^T h = \sum_{\ell=1}^{n} h_\ell X_{\ell j}. \tag{22}$$

Note that each term $X_{\ell j}$ in this sum is distributed as a mixture variable, taking the
value 0 with probability $1 - \gamma$, and distributed as a $N(0, \frac{1}{\gamma})$ variable with
probability $\gamma$. For each $\ell = 1, \ldots, n$, define the discrete random variable

$$H_\ell \overset{d}{=} \begin{cases} h_\ell & \text{with probability } \gamma \\ 0 & \text{with probability } 1 - \gamma. \end{cases}$$

For each index $\ell = 1, \ldots, n$, let $Z_{\ell j} \sim N(0, \frac{1}{\gamma})$. With these
definitions, by construction, we have

$$\frac{V_j^a}{\rho_n} \overset{d}{=} \sum_{\ell=1}^{n} H_\ell Z_{\ell j}. \tag{23}$$

To gain some intuition for the behavior of this sum, note that the variables
$\{Z_{\ell j}, \ell = 1, \ldots, n\}$ are independent of $\{H_\ell, \ell = 1, \ldots, n\}$. (In
particular, each $H_\ell$ is a function of $X_S$, whereas $Z_{\ell j}$ is a function of
$X_{\ell j}$, with $j \notin S$.) Consequently, we may condition on $H$ without affecting $Z$, and
since $Z$ is Gaussian, we have
$\big( \frac{V_j^a}{\rho_n} \mid H \big) \sim N\big(0, \frac{\|H\|_2^2}{\gamma}\big)$. Therefore,
if we can obtain good control on the norm $\|H\|_2$, then we can use standard Gaussian tail bounds
(see Appendix A) to control the maximum $\max_{j \in S^c} |V_j^a| / \rho_n$. The following lemma
is proved in Appendix C.

Lemma 6. Under condition (8c), for any fixed $\delta > 0$, we have



Hi < 



7k{l + 6) 



2 
2 



> l-0(exp(-min{21og(p- /c),— })) 



The primary implication of the above bound is that each V-^ / pn variable is (essentially) no larger 
than a A^(0, ^) variable. We can then use standard techniques for bounding the tails of Gaussian 
variables to obtain good control over the random variable maxj \ V-^\/ pn- In particular, by union 
bound, we have 

n 

P[max > (1 - 5)pn\ < (p-k) H^jZj > (1 - 6)] 

For any 6 > 0, define the event 7(5) := {|| H < ^"'^l^^^ }- Continuing on, we have 

F[umx\v;\>{l-6)pn] < ip-k) f^¥[p^He,Z,>{l-6) \T{S)]+n{'rm]^ 

< (p-k) |2 exp ) + 0(exp(- min (2 log(p - fc), ^))) 

where the last line uses a standard Gaussian tail bound (see Appendix A) and Lemma 6. Finally,
it can be verified that under the condition $n > (2 + \epsilon) k \log(p - k)$ for some
$\epsilon > 0$, and with $\delta > 0$ chosen sufficiently small, we have
$\mathbb{P}[\max_{j \in S^c} |V_j^a| \ge (1 - \delta) \rho_n] \to 0$ as claimed.
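
The final verification step can be checked numerically. The sketch below evaluates the union-bound term $2(p - k)\exp\big(-\frac{(1-\delta)^2}{2v}\big)$, where $v$ is the conditional variance of $V_j^a / \rho_n$. It assumes, as a reading of Lemma 6, that $v \le k(1 + \delta)/n$, and it ignores the probability that this variance bound fails. The function name is hypothetical.

```python
import math


def union_bound(p: int, k: int, eps: float, delta: float) -> float:
    """Union bound 2 (p - k) exp(-(1 - delta)^2 / (2 v)) at n = (2 + eps) k log(p - k).

    Assumption of this sketch: v = k (1 + delta) / n bounds the conditional variance.
    """
    n = (2.0 + eps) * k * math.log(p - k)      # real-valued sample size
    v = k * (1.0 + delta) / n                  # assumed variance bound
    log_bound = math.log(2.0) + math.log(p - k) - (1.0 - delta) ** 2 / (2.0 * v)
    return math.exp(log_bound)
```

With `eps = 0.5` and `delta = 0.05`, the bound behaves like $2(p - k)^{-c}$ with $c = \frac{(1-\delta)^2 (2+\epsilon)}{2(1+\delta)} - 1 \approx 0.074$, so it decays to zero as $p$ grows, though slowly. The exponent $c$ is positive only when $\delta$ is small relative to $\epsilon$, which is why $\delta$ must be chosen sufficiently small.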
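3.3 Proof of Lemma 4

Defining the orthogonal projection matrix
$\Pi_S^\perp := I_{n \times n} - X_S (X_S^T X_S)^{-1} X_S^T$, we then have

$$\mathbb{P}\Big[ \max_{j \in S^c} |V_j^b| \ge \delta \rho_n \Big] = \mathbb{P}\Big[ \max_{j \in S^c} |X_j^T \Pi_S^\perp (W/n)| \ge \delta \rho_n \Big] \le (p - k)\, \mathbb{P}\Big[ |X_j^T \Pi_S^\perp (W/n)| \ge \delta \rho_n \Big]. \tag{24}$$

Recall from equation (23) the representation $X_{\ell j} = H_{\ell j} Z_{\ell j}$, where
$H_{\ell j}$ is Bernoulli with parameter $\gamma$, and $Z_{\ell j} \sim N(0, \frac{1}{\gamma})$ is
Gaussian. The variable $\sum_{\ell=1}^{n} H_{\ell j}$ is binomial; define the following event

$$T := \Big\{ \Big| \frac{1}{n} \sum_{\ell=1}^{n} H_{\ell j} - \gamma \Big| \le \frac{1}{2\sqrt{k}} \Big\}.$$

From the Hoeffding bound (see Lemma 7), we have
$\mathbb{P}[T^c] \le 2\exp(-\frac{n}{2k})$. Using this representation and conditioning on $T$, we
have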



Xjn^{W/n)\ >6p„ 



< 



< 



1 



\-^HejZejUj{W)e\>5pn \ T 



1=1 



n 



+ 2exp(--), 



where we have assumed without loss of generality that the first
$n(\gamma + \frac{1}{2\sqrt{k}})$ elements of $H$ are non-zero. Since $\Pi_S^\perp$ is an
orthogonal projection matrix, we have $\|\Pi_S^\perp W\|_2 \le \|W\|_2$, so that



Xjli^{W/n)\ >6pn 



variance 



+ 2exp(-- 



(25) 



I- V Zf,jW,\>5p 

?i(7H — 1^) 

Conditioned on the random variable Mj := - X]£=i Z^jWi is zero-mean Gaussian with 



r2 
■I ■ 



e=i 



For some 6i > 0, define the event 



Note that E[u{W;-/)] = + ^). Since ^ Y.i=i is with d = + ^) degrees 

of freedom, using x^-tail bounds (see Appendix we have 

P[(T2(<5i))1 < exp(-n(7 + ^)^). 
Now, by conditioning on $T_2(\delta_1)$ and its complement and using tail bounds on Gaussian
variates (see Appendix A), we obtain



< 



< 2 exp 



1=1 



+n{T2{6i)r 



+ 



2.^(1 + 5i)(7 + ^), 

1 ,352 



exp 



-n(7 



(26) 



Finally, putting together the pieces from equations ([26 
that P[maxjg5c \V^\ > 5pn] is upper bounded by 



2Vk' 16 

25]) . and equation (j24p . we obtain 



+ exp(-n(7 + ^)- 



The first term goes to zero since n > (2 + e)fclog(p — A;). The second term goes to zero because 
eventually _ ^ 1 > ^ (because Condition ([8c|) implies that 7\/^ — > 00), and Conditon (iSaj) implies 



7+ 



that Cii'yp^ — log {p — k) ^ 00. Our choice of n and Condition (j8c|) (which implies that 7/c ^ 00) 
is enough for the third term goes to zero. 

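3.4 Proof of Lemma 5

We first observe that conditioned on $X_S$, each $U_i$ is Gaussian with mean and variance:

$$m_i := \mathbb{E}[U_i \mid X_S] = e_i^T \big( \tfrac{1}{n} X_S^T X_S \big)^{-1} [-\rho_n \vec{1}],$$
$$\nu_i := \operatorname{var}[U_i \mid X_S] = \frac{\sigma^2}{\gamma n}\, e_i^T \big( \tfrac{1}{n} X_S^T X_S \big)^{-1} e_i.$$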

Define the upper bounds 



7n n 



m 




a 

7n 



l-0(- /max [ ''^^ loglog(p- k) 

7y \ log (p — fc) ' log(p — /c) ' 



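and the following event

$$T(m^*, \nu^*) := \Big\{ \max_{i \in S} |m_i| \le m^* \ \text{and} \ \max_{i \in S} |\nu_i| \le \nu^* \Big\}.$$

Conditioning on $T(m^*, \nu^*)$ and its complement, we have

$$\mathbb{P}\Big[ \frac{1}{\beta_{\min}} \max_{i \in S} |U_i| > 1 \Big] \le \mathbb{P}\Big[ \frac{1}{\beta_{\min}} \max_{i \in S} |U_i| > 1 \,\Big|\, T(m^*, \nu^*) \Big] + \mathbb{P}\big[ (T(m^*, \nu^*))^c \big].$$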
Applying Lemma [TOl with t = 1 and 9 = k, we have F[(T{'m* ,tp*)y] < kO(k~'^).

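We now deal with the first term. Letting $Y_i \sim N(0, \nu_i)$, and using $T$ as shorthand for
the event $T(m^*, \nu^*)$, we have

$$\begin{aligned}
\mathbb{P}\Big[ \frac{1}{\beta_{\min}} \max_{i \in S} |U_i| > 1 \,\Big|\, T \Big]
 &= \mathbb{E}\Big\{ \mathbb{P}\Big[ \max_{i \in S} |U_i| > \beta_{\min} \,\Big|\, X_S, T \Big] \Big\} \\
 &\le \mathbb{E}\Big\{ \mathbb{P}\Big[ \max_{i \in S} (|m_i| + |Y_i|) \ge \beta_{\min} \,\Big|\, X_S, T \Big] \Big\} \\
 &\le \mathbb{E}\Big\{ \mathbb{P}\Big[ m^* + \max_{i \in S} |Y_i| \ge \beta_{\min} \,\Big|\, X_S, T \Big] \Big\} \\
 &= \mathbb{E}\Big\{ \mathbb{P}\Big[ \frac{1}{\beta_{\min}} \max_{i \in S} |Y_i| \ge 1 - \frac{m^*}{\beta_{\min}} \,\Big|\, X_S, T \Big] \Big\}.
\end{aligned}$$

Condition (8b) implies that $\frac{m^*}{\beta_{\min}} \to 0$, so that it suffices to upper bound

$$\mathbb{E}\Big\{ \mathbb{P}\Big[ \frac{1}{\beta_{\min}} \max_{i \in S} |Y_i| \ge \frac{1}{2} \,\Big|\, X_S, T \Big] \Big\} \le \mathbb{E}\Big\{ k\, \mathbb{P}\Big[ |Y^*| \ge \frac{\beta_{\min}}{2} \,\Big|\, X_S, T \Big] \Big\} \le 2k \exp\Big( -\frac{\beta_{\min}^2}{8 \nu^*} \Big),$$

where $Y^* \sim N(0, \nu^*)$, and we have used standard Gaussian tail bounds (see Appendix A).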

It remains to verify that this final term converges to zero. Taking logarithms and ignoring 
constant terms, we have 



( 



32 (i (nrl I f log (fc) logl og(p-fc) ] A \ 

PmmT'MJ^ ^\-yy ^^^'^^\k log (p-k)' log(p-fc) j)j 

8cj2 log k 

V / 



We would like to show that this quantity diverges to — cxd. Condition ([8c|) implies that 



- /max [ ''^^ loglog(p-fc) ^ ^ 

7y \ log (p — A;) ' log(p — A;) 



Hence, it suffices to show that log k [1 — ^g™j"J^fc ^ diverges to —00. We have 
log(A;) ( 1 - j3lilL^\ = log{k) (1 ^ 



161og(/c)y 16cr2 log(A;) ' 



log(A;) (1 - ^""^^ ~ ^) ■ 

16cr^ log(p — A;) log (A;) 

-^^3 r u>j aiiu wuiiuition ([8a|) states that ^^^^^^^^^ 



Condition (|8bp implies that — > 00 and Condition ([8a|) states that — > 00. In our 



observation model, $k \le p/2$, and so the third term is greater than one.
Therefore, we have that $\mathbb{P}[(\mathcal{E}(U))^c]$ tends to zero.
4 Experimental Results

In this section, we provide some experimental results to illustrate the claims of Theorem 1. We
consider two different sparsity regimes, namely linear sparsity ($k = \alpha p$) and polynomial
sparsity ($k = \sqrt{p}$), and we allow $\gamma$ to converge to zero at some rate.

For all experiments, the additive noise variance is set to $\sigma^2 = 0.0625$, and we fix the
vector $\beta^*$ by setting the first $k$ entries to one, and the remaining entries to zero. There
is no loss of generality in fixing the support in this way, since the ensemble is invariant under
permutations.

Based on Lemma 2, it suffices to simulate the random variables
$\{V_j^a, V_j^b, j \in S^c\}$ and $\{U_i, i \in S\}$, and then check the equivalent conditions
(17a) and (17b). In all cases, we plot the success probability
$\mathbb{P}[\mathbb{S}(\hat{\beta}) = \mathbb{S}(\beta^*)]$ versus the control parameter
$\theta(n, p, k) = \frac{n}{2k \log(p - k)}$. Note that Theorem 1 predicts that the Lasso should
transition from failure to success for $\theta \approx 1$.

In Figure 1, the empirical success rate of the Lasso is plotted against the control parameter
$\theta(n, p, k) = \frac{n}{2k \log(p - k)}$. Each panel shows three curves, corresponding to the
problem sizes $p \in \{512, 1024, 2048\}$, and each point on the curve represents the average of
100 trials. For the
experiments in Figure [U we set 7 = 0.5^^^J^, which converges to zero at a rate slightly faster 
than that guaranteed by Theorem 1. Nonetheless, we still observe the "stacking" behavior around
the predicted threshold $\theta^* = 1$.
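
An experiment of this kind can be sketched as follows. The block is a minimal pure-Python illustration, not the code used to produce the reported figures. It assumes non-zero entries drawn from $N(0, 1)$, a support equal to the first $k$ coordinates with $\beta^*_S = \vec{1}$, and the sign convention $\frac{1}{n} X^T (X\hat{\beta} - Y) + \rho_n z = 0$. The regularization choice inside `success_rate` is hypothetical, as are all function names.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    M = [list(A[i]) + [b[i]] for i in range(m)]
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, m):
            f = M[r][c] / M[c][c]
            for j in range(c, m + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        tail = sum(M[r][j] * x[j] for j in range(r + 1, m))
        x[r] = (M[r][m] - tail) / M[r][r]
    return x


def one_trial(n, p, k, gamma, sigma, rho, rng):
    """Draw one problem instance and test the conditions (13c) and (13d)."""
    X = [[rng.gauss(0.0, 1.0) if rng.random() < gamma else 0.0
          for _ in range(p)] for _ in range(n)]
    W = [rng.gauss(0.0, sigma) for _ in range(n)]
    S = range(k)                                  # support: first k coordinates
    Sigma = [[sum(X[l][a] * X[l][b] for l in range(n)) / n for b in S] for a in S]
    rhs = [sum(X[l][a] * W[l] for l in range(n)) / n - rho for a in S]
    try:
        U = solve(Sigma, rhs)                     # U = beta_hat_S - beta*_S
    except ZeroDivisionError:                     # singular Sigma_SS counts as failure
        return False
    if any(1.0 + u <= 0.0 for u in U):            # sign condition (13d)
        return False
    resid = [W[l] - sum(X[l][a] * U[a] for a in S) for l in range(n)]
    return all(abs(sum(X[l][j] * resid[l] for l in range(n))) < n * rho
               for j in range(k, p))              # strict dual feasibility (13c)


def success_rate(theta, p, k, gamma, sigma=0.25, trials=20, seed=0):
    """Empirical success frequency at control parameter theta."""
    rng = random.Random(seed)
    n = math.ceil(theta * 2.0 * k * math.log(p - k))
    # Hypothetical regularization choice for this sketch, not taken from the paper.
    rho = math.sqrt(math.log(p - k) * math.log(math.log(p - k)) / (gamma * n))
    wins = sum(one_trial(n, p, k, gamma, sigma, rho, rng) for _ in range(trials))
    return wins / trials
```

Sweeping `theta` over a grid around 1 for several values of `p` and plotting `success_rate` gives curves of the type described above. Pure-Python loops are slow at the problem sizes used in the figures, so an array library would be preferable for a full replication.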




Figure 1. Plots of the success probability
$\mathbb{P}[\mathbb{S}(\hat{\beta}) = \mathbb{S}(\beta^*)]$ versus the control parameter
$\theta(n, p, k) = \frac{n}{2k \log(p - k)}$ for $\gamma$-sparsified ensembles, with decaying
measurement sparsity $\gamma$. (a) Polynomial signal sparsity $k = O(\sqrt{p})$. (b) Linear signal
sparsity $k = \Theta(p)$.



5 Discussion 

In this paper, we have studied the problem of recovering the support set of a sparse vector
$\beta^*$ based on noisy observations. The main result is to show that it is possible to
"sparsify" standard dense measurement matrices, so that they have a vanishing fraction of
non-zeroes per row, while retaining the same sample complexity (number of observations $n$)
required for exact recovery. We also showed that under the support recovery metric and in the
presence of noise, no method can succeed without the number of non-zeroes per column tending to
infinity. See also the paper [22] for complementary results on the information-theoretic scaling
of sparse measurement ensembles.

The approach taken in this paper is to find rates at which $\gamma$ (as a function of $n$, $p$,
$k$) can safely tend towards zero while maintaining the same statistical efficiency as dense
random matrices. In various practical settings [21], it may be preferable to make the measurement
ensembles even sparser at the cost of taking more measurements $n$ and thus decreasing efficiency
relative to dense random matrices. A natural question is the sample complexity
$n(\gamma, p, k)$ in this regime as well. Finally, this work has focused only on randomly
sparsified matrices, as opposed to particular sparse designs (e.g., based on LDPC or expander-type
constructions [1] HSl [23]). Although our results imply that exact support recovery with noisy
observations is impossible with bounded degree designs, it would be interesting to examine the
trade-off between other loss functions (e.g., $\ell_2$ reconstruction error) and sparse
measurement designs.
A Standard concentration results

In this appendix, we collect some tail bounds used repeatedly throughout this paper.

Lemma 7 (Hoeffding bound [9]). Given a binomial variate $Z \sim \operatorname{Bin}(n, \gamma)$,
we have for any $\delta > 0$

$$\mathbb{P}[|Z - \gamma n| \ge \delta n] \le 2\exp(-2n\delta^2).$$

Lemma 8 ($\chi^2$-concentration [10]). Let $X \sim \chi^2_m$ be a chi-squared variate with $m$
degrees of freedom. Then for all $\frac{1}{2} > \delta > 0$, we have

$$\mathbb{P}[X - m \ge \delta m] \le \exp\Big( -\frac{3}{16} m \delta^2 \Big).$$

We will also find the following standard Gaussian tail bound [11] useful:

Lemma 9 (Gaussian tail behavior). Let $V \sim N(0, \sigma^2)$ be a zero-mean Gaussian with
variance $\sigma^2$. Then for all $\delta > 0$, we have

$$\mathbb{P}[|V| \ge \delta] \le 2\exp\Big( -\frac{\delta^2}{2\sigma^2} \Big).$$

B Convex optimality conditions

B.1 Proof of Lemma 2

Let $f(\beta) := \frac{1}{2n} \|Y - X\beta\|_2^2 + \rho_n \|\beta\|_1$ denote the objective
function of the Lasso (4). By standard convex optimality conditions [15], a vector
$\hat{\beta} \in \mathbb{R}^p$ is a solution to the Lasso if and only if $0 \in \mathbb{R}^p$ is
an element of the subdifferential of $f(\beta)$ at $\hat{\beta}$. These conditions lead to

$$\frac{1}{n} X^T (X\hat{\beta} - Y) + \rho_n z = 0,$$

where the dual vector $z \in \mathbb{R}^p$ is an element of the subdifferential of the
$\ell_1$-norm, given by

$$\partial \|\hat{\beta}\|_1 = \big\{ z \in \mathbb{R}^p \mid z_i = \operatorname{sign}(\hat{\beta}_i) \text{ if } \hat{\beta}_i \neq 0, \ z_i \in [-1, 1] \text{ otherwise} \big\}.$$

Now suppose that we are given a pair {(3,z) G x that satisfy the assumptions of 
Lemma [2j Condition ()12p is equivalent to z) satisfying the zero subgradient condition. Condi- 
tions (fT3a|) . (fT3cj) and (fTMl) ensure that z is an element of the subdifferential of the £i-norm at (3. 
Finally, conditions (jl3b[) and ( 13dl) ensure that /3 correctly specifies the signed support. 

It remains to verify that $\hat\beta$ is the unique optimal solution. By Lagrangian duality, the Lasso
problem (given in penalized form) can be written as an equivalent constrained optimization
problem over the ball $\|\beta\|_1 \leq C(\rho_n)$, for some constant $C(\rho_n) < +\infty$. Equivalently, we can
express this single $\ell_1$-constraint as a set of $2^p$ linear constraints $v^T\beta \leq C$, one for each sign vector
$v \in \{-1, +1\}^p$. The vector $z$ can be written as a convex combination $z = \sum_v \alpha^*_v v$, where the
weights $\alpha^*_v$ are non-negative and sum to one. By construction of $\hat\beta$ and $z$, the weights $\alpha^*$ form an
optimal Lagrange multiplier vector for the problem. Consequently, any other optimal solution, say
$\tilde\beta$, must also minimize the associated Lagrangian

$$L(\beta; \alpha^*) = f(\beta) + \sum_v \alpha^*_v\,[v^T\beta - C],$$

and satisfy the complementary slackness conditions $\alpha^*_v\,[v^T\tilde\beta - C] = 0$. Note that these
complementary slackness conditions imply that $z^T\tilde\beta = C$. But this can only happen if $\tilde\beta_j = 0$ for all indices
where $|z_j| < 1$. Therefore, any optimal solution $\tilde\beta$ satisfies $\tilde\beta_{S^c} = 0$. Finally, given that all optimal
solutions satisfy $\beta_{S^c} = 0$, we may consider the restricted optimization problem subject to this set
of constraints. If the Hessian submatrix $\hat\Sigma_{SS}$ is strictly positive definite, then this sub-problem is
strictly convex, so that $\hat\beta$ must be the unique optimal solution, as claimed.
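Two facts used in the uniqueness argument can be illustrated on a toy example: the $\ell_1$-norm is the maximum of $v^T\beta$ over sign vectors $v$, and any $z \in [-1,1]^p$ is a convex combination of sign vectors. The product-form weights below are one convenient choice made for this sketch, not necessarily the multipliers $\alpha^*$ of the proof; the vectors `beta` and `z` are hypothetical.

```python
from itertools import product

def l1_norm_via_sign_vectors(beta):
    """max over v in {-1,+1}^p of v^T beta, which equals the l1-norm of beta."""
    return max(sum(vi * bi for vi, bi in zip(v, beta))
               for v in product((-1, 1), repeat=len(beta)))

def convex_weights(z):
    """Product-form weights alpha_v >= 0 with sum_v alpha_v = 1 and sum_v alpha_v v = z."""
    weights = {}
    for v in product((-1, 1), repeat=len(z)):
        w = 1.0
        for vj, zj in zip(v, z):
            w *= (1.0 + vj * zj) / 2.0
        weights[v] = w
    return weights

beta = [1.5, -0.5, 0.0]
z = [1.0, -1.0, 0.3]  # z_j = sign(beta_j) on the support, |z_j| < 1 off it
alpha = convex_weights(z)
recombined = [sum(a * v[j] for v, a in alpha.items()) for j in range(len(z))]
print(l1_norm_via_sign_vectors(beta), sum(alpha.values()), recombined)
```

In this example every sign vector with positive weight satisfies $v^T\beta = \|\beta\|_1$, which is the complementary slackness pattern invoked in the proof.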



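B.2 Derivation of $\{V_j^a, V_j^b, U_i\}$

In this appendix, we derive the form of the $\{V_j^a, V_j^b\}$ and $\{U_i\}$ variables defined in equations (16a)
through (16c). We begin by writing the zero sub-gradient condition in block form, substituting
the observation model $Y = X\beta^* + W$ and the relations specified in conditions (13a) and (13b):

$$\begin{bmatrix} \hat\Sigma_{SS} & \hat\Sigma_{SS^c} \\ \hat\Sigma_{S^cS} & \hat\Sigma_{S^cS^c} \end{bmatrix}
\begin{bmatrix} \hat\beta_S - \beta^*_S \\ 0 \end{bmatrix}
- \frac{1}{n}\begin{bmatrix} X_S^T W \\ X_{S^c}^T W \end{bmatrix}
+ \rho_n \begin{bmatrix} \mathrm{sign}(\beta^*_S) \\ z_{S^c} \end{bmatrix} = 0.$$

By solving the top block, we obtain

$$U := \hat\beta_S - \beta^*_S = (\hat\Sigma_{SS})^{-1}\Big[\frac{1}{n} X_S^T W - \rho_n\,\mathrm{sign}(\beta^*_S)\Big].$$

By back-substituting this relation into the lower block, we can solve explicitly for $z_{S^c}$; doing so
yields that $z_{S^c} = V^a + V^b$, where the $(p-k)$-vectors $V^a$ and $V^b$ are defined in equations (16a) and (16b).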
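The two-step solution of the block system can be traced numerically. The sketch below assumes the observation model $Y = X\beta^* + W$ and the normalization $\hat\Sigma = \frac{1}{n}X^TX$; it solves the top block for $U$, back-substitutes into the lower block for $z_{S^c}$, and confirms that the assembled pair satisfies the zero-subgradient equation. The dimensions, the value of `rho`, and the helper names (`solve`, `gram`, `xtw`) are hypothetical.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    k = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, k):
            f = M[r][c] / M[c][c]
            for j in range(c, k + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * k
    for r in reversed(range(k)):
        x[r] = (M[r][k] - sum(M[r][j] * x[j] for j in range(r + 1, k))) / M[r][r]
    return x

rng = random.Random(0)
n, p, rho = 40, 5, 0.2
S = [0, 1]
Sc = [j for j in range(p) if j not in S]
beta_star = [1.0, -2.0, 0.0, 0.0, 0.0]
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
W = [rng.gauss(0.0, 0.1) for _ in range(n)]
Y = [sum(X[i][j] * beta_star[j] for j in range(p)) + W[i] for i in range(n)]

def gram(rows_idx, cols_idx):
    """Block (1/n) X_A^T X_B of the sample covariance."""
    return [[sum(X[i][a] * X[i][b] for i in range(n)) / n for b in cols_idx] for a in rows_idx]

def xtw(idx):
    """Vector (1/n) X_A^T W."""
    return [sum(X[i][a] * W[i] for i in range(n)) / n for a in idx]

sign_S = [1.0 if beta_star[j] > 0 else -1.0 for j in S]

# Top block: Sigma_SS U - (1/n) X_S^T W + rho * sign(beta*_S) = 0.
U = solve(gram(S, S), [w - rho * s for w, s in zip(xtw(S), sign_S)])

# Lower block: Sigma_ScS U - (1/n) X_Sc^T W + rho * z_Sc = 0.
G = gram(Sc, S)
z_Sc = [(w - sum(G[r][c] * U[c] for c in range(len(S)))) / rho
        for r, w in enumerate(xtw(Sc))]

# Assemble (beta_hat, z) and evaluate (1/n) X^T (X beta_hat - Y) + rho z.
beta_hat, z = [0.0] * p, [0.0] * p
for idx, j in enumerate(S):
    beta_hat[j], z[j] = beta_star[j] + U[idx], sign_S[idx]
for idx, j in enumerate(Sc):
    z[j] = z_Sc[idx]
fit = [sum(X[i][j] * beta_hat[j] for j in range(p)) - Y[i] for i in range(n)]
kkt = [sum(X[i][j] * fit[i] for i in range(n)) / n + rho * z[j] for j in range(p)]
print(max(abs(v) for v in kkt), max(abs(v) for v in z_Sc))
```

The residual is zero by construction; whether the witness succeeds depends on the second printed quantity, since strict dual feasibility requires $\max_j |z_j| < 1$ on $S^c$.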
C Proof of Lemma 6

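Let $Z \in \mathbb{R}^{n \times n}$ denote an $n \times n$ matrix for which the off-diagonal elements $Z_{ij} = 0$ for all $i \neq j$, and
the diagonal elements $Z_{ii} \sim \mathrm{Ber}(\gamma)$ are i.i.d. With this notation, we can write $H = Zh$. Using the
definition (21) of $h$, we have

$$\|H\|_2^2 = \|Zh\|_2^2
= \vec{1}^T (\hat\Sigma_{SS})^{-1} \Big(\frac{Z X_S}{n}\Big)^T \Big(\frac{Z X_S}{n}\Big) (\hat\Sigma_{SS})^{-1} \vec{1}
= \frac{1}{n}\,\vec{1}^T (\hat\Sigma_{SS})^{-1} \underbrace{\Big\{\frac{1}{n}\sum_{i=1}^n Z_{ii}\, x_i x_i^T\Big\}}_{\Gamma(Z)} (\hat\Sigma_{SS})^{-1} \vec{1},$$

where $x_i$ is the $i$th row of the matrix $X_S$. From Lemma 10 with $\theta = 1$ and $t = (p - k)$, we have

$$\mathbb{P}\Big[\,|||(\hat\Sigma_{SS})^{-1}|||_2 \geq \frac{1}{f_1(p,k,\gamma)}\Big] \;\leq\; \frac{1}{(p-k)^2}, \tag{27}$$

where $f_1(p,k,\gamma) := 1 - O\Big(\frac{1}{\gamma}\sqrt{\max\Big\{\frac{1}{k},\, \frac{\log\log(p-k)}{\log(p-k)}\Big\}}\Big)$.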



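Next we control the spectral norm of the random matrix $\Gamma(Z)$, conditioned on the total number
$\sum_{i=1}^n Z_{ii}$ of non-zero entries. In particular, applying Lemma 10 with $t = p - k$ and $\theta = \frac{1}{n}\sum_{i=1}^n Z_{ii}$, we have

$$\mathbb{P}\Big[\,|||\Gamma(Z)|||_2 \geq \theta\Big[1 + \frac{C}{\gamma}\sqrt{\max\Big\{\frac{1}{\theta k},\, \frac{\log[\theta\log(p-k)]}{\theta\log(p-k)}\Big\}}\Big] \;\Big|\; \sum_{i=1}^n Z_{ii}\Big] \;\leq\; \frac{1}{(p-k)^2}, \tag{28}$$

as long as $k\theta \to \infty$.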



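The next step is to deal with the conditioning. Define the event

$$\mathcal{T}(k,\gamma) := \Big\{\Big|\frac{1}{n}\sum_{i=1}^n Z_{ii} - \gamma\Big| \leq \frac{1}{2\sqrt{k}}\Big\}.$$

Defining the function

$$f_2(p,k,\gamma) := \Big(1 + \frac{1}{2\sqrt{k}\,\gamma}\Big)\Big[1 + O\Big(\frac{1}{\gamma}\sqrt{\max\Big\{\frac{1}{(\gamma - \frac{1}{2\sqrt{k}})\,k},\, \frac{\log[(\gamma + \frac{1}{2\sqrt{k}})\log(p-k)]}{(\gamma - \frac{1}{2\sqrt{k}})\log(p-k)}\Big\}}\Big)\Big],$$

we have

$$\begin{aligned}
\mathbb{P}\big[\,|||\Gamma(Z)|||_2 \geq \gamma f_2(p,k,\gamma)\big]
&\leq \mathbb{P}\big[\,|||\Gamma(Z)|||_2 \geq \gamma f_2(p,k,\gamma) \mid \mathcal{T}(k,\gamma)\big] + \mathbb{P}\big[(\mathcal{T}(k,\gamma))^c\big] \\
&\leq \exp(-2\log(p-k)) + 2\exp\Big(-\frac{n}{2k}\Big) \\
&\leq 3\exp\Big(-\min\Big\{2\log(p-k),\, \frac{n}{2k}\Big\}\Big),
\end{aligned} \tag{29}$$

where we have used the bound (28) and the Hoeffding bound (see Lemma 7).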

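Combining the bounds (27) and (29), we conclude that as long as $\gamma k \to \infty$, then

$$\mathbb{P}\Big[\,|||(\hat\Sigma_{SS})^{-1}\,\Gamma(Z)\,(\hat\Sigma_{SS})^{-1}|||_2 \geq \frac{\gamma f_2}{f_1^2}\Big] \;\leq\; 4\exp\Big(-\min\Big\{2\log(p-k),\, \frac{n}{2k}\Big\}\Big).$$

Since $\|\vec{1}\|_2 = \sqrt{k}$, we have

$$\mathbb{P}\Big[\|H\|_2^2 \geq \frac{k\gamma}{n}\,\frac{f_2}{f_1^2}\Big] \;\leq\; 4\exp\Big(-\min\Big\{2\log(p-k),\, \frac{n}{2k}\Big\}\Big).$$

To conclude the proof, we note that assumption (8c) implies that both $f_1(p,k,\gamma)$ and $f_2(p,k,\gamma)$
converge to 1 as $(p, k, \gamma)$ scale. In particular, for any fixed $\delta > 0$, we have $f_2/f_1^2 < (1 + \delta)$ for $(p, k)$
sufficiently large, so that Lemma 6 follows.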



D Singular values of sparsified matrices 

Let $\theta(p, k) \in (0, 1]$ and $t(p, k) \in \{1, 2, 3, \ldots\}$ be functions. Let $X$ be a $\theta n \times k$ random matrix with
i.i.d. entries $X_{ij}$ distributed according to the $\gamma$-sparsified ensemble (6).

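Lemma 10. Suppose that $n \geq (2 + \nu)\,k\log(p - k)$ for some $\nu > 0$. If as $k, p - k \to \infty$

$$T(\gamma, k, p, \theta, t) := \frac{1}{\gamma}\sqrt{\max\Big\{\frac{\log(t)}{\theta k\log(p-k)},\, \frac{\log[\theta\log(p-k)]}{\theta\log(p-k)}\Big\}} \;\to\; 0,$$

then for some constant $C \in (0, \infty)$, we have

$$\mathbb{P}\Big[\sup_{\|u\|_2 = 1}\Big|\frac{1}{\sqrt{\theta n}}\|Xu\|_2 - 1\Big| \geq C\,T(\gamma, k, p, \theta, t)\Big] \;\leq\; O\Big(\frac{1}{t^2}\Big). \tag{30}$$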



Note that Lemma 10 with $\theta = 1$ and $t = p - k$ implies that $\hat\Sigma_{SS} = \frac{1}{n}X_S^T X_S$ is invertible with
probability greater than $1 - O\big(\frac{1}{(p-k)^2}\big)$, thereby establishing the corresponding invertibility lemma. Other settings in which this
lemma is applied are $(\theta, t) = (\gamma, p - k)$ and $(\theta, t) = (1, k)$. The remainder of this section is devoted
to the proof of Lemma 10.

D.l Bounds on expected values 

Let $X \in \mathbb{R}^{\theta n \times k}$ be a random matrix with i.i.d. entries of the sparsified Gaussian form

$$X_{ij} \sim (1 - \gamma)\,\delta_0 + \gamma\, N(0, 1/\gamma).$$

Note that $\mathbb{E}[X_{ij}] = 0$ and $\mathrm{var}(X_{ij}) = 1$ by construction.

We follow the proof technique outlined in [19]. We first note the tail bound:

Lemma 11. Let $Y_1, \ldots, Y_d$ be i.i.d. samples of the $\gamma$-sparsified ensemble. Given any vector $a \in \mathbb{R}^d$
and $t > 0$, we have

$$\mathbb{P}\Big[\sum_{i=1}^d a_i Y_i > t\Big] \;\leq\; \exp\Big(-\frac{\gamma t^2}{2\|a\|_2^2}\Big).$$

To establish this bound, note that each $Y_i$ is dominated (stochastically) by the random variable
$Z \sim N(0, 1/\gamma)$. In particular, we have

$$M_{Y_i}(\lambda) = \mathbb{E}[\exp(\lambda Y_i)] = (1 - \gamma) + \gamma\,\mathbb{E}[\exp(\lambda Z)] \;\leq\; \exp\Big(\frac{\lambda^2}{2\gamma}\Big).$$
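The moment properties of the sparsified ensemble and the moment generating function comparison can be checked with a short simulation. The sketch below assumes the mixture form $(1-\gamma)\delta_0 + \gamma N(0, 1/\gamma)$; the sample size, the value of `gamma`, and the function names are illustrative choices.

```python
import math
import random

def sample_sparsified(gamma, rng):
    """One draw from (1 - gamma) * delta_0 + gamma * N(0, 1/gamma)."""
    if rng.random() < gamma:
        return rng.gauss(0.0, 1.0 / math.sqrt(gamma))
    return 0.0

def mgf_exact(lam, gamma):
    """E[exp(lam * Y)] = (1 - gamma) + gamma * exp(lam^2 / (2 gamma))."""
    return (1.0 - gamma) + gamma * math.exp(lam ** 2 / (2.0 * gamma))

def mgf_bound(lam, gamma):
    """Moment generating function of Z ~ N(0, 1/gamma)."""
    return math.exp(lam ** 2 / (2.0 * gamma))

rng = random.Random(0)
gamma = 0.2
draws = [sample_sparsified(gamma, rng) for _ in range(200000)]
mean = sum(draws) / len(draws)
var = sum((x - mean) ** 2 for x in draws) / len(draws)
print(f"mean = {mean:.3f}, variance = {var:.3f}")

# Compare the two moment generating functions on a grid of lambda values.
dominated = all(mgf_exact(l / 10.0, gamma) <= mgf_bound(l / 10.0, gamma) + 1e-12
                for l in range(-30, 31))
print(dominated)
```

The domination holds for every $\lambda$ because $\exp(\lambda^2/(2\gamma)) \geq 1$, so replacing the point mass at zero by the Gaussian component can only increase the moment generating function.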

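Now let us bound the maximum singular value $s_1(X)$ of the random matrix $X$. Letting $S^{d-1}$
denote the $\ell_2$ unit ball in $d$ dimensions, we begin with the variational representation

$$s_1(X) = \max_{u \in S^{k-1}} \|Xu\| = \max_{v \in S^{\theta n - 1}}\;\max_{u \in S^{k-1}} v^T X u.$$

For an arbitrary $\epsilon \in (0, 1)$, we can find $\epsilon$-covers (in $\ell_2$ norm) of $S^{\theta n - 1}$ and $S^{k-1}$ with $M_{\theta n}(\epsilon) = (3/\epsilon)^{\theta n}$
and $M_k(\epsilon) = (3/\epsilon)^k$ points respectively [12]. Denote these covers by $C_{\theta n}(\epsilon)$ and $C_k(\epsilon)$ respectively.
A standard argument shows that for all $\epsilon \in (0, 1)$, we have

$$\|X\|_2 \;\leq\; \frac{1}{(1 - \epsilon)^2}\,\max_{u_\alpha \in C_k(\epsilon)}\;\max_{v_\beta \in C_{\theta n}(\epsilon)} v_\beta^T X u_\alpha.$$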

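Let us analyze the maximum on the RHS: for a fixed pair $(u, v)$ in our covers, we have

$$v^T X u = \sum_{i=1}^{\theta n}\sum_{j=1}^{k} X_{ij}\, v_i u_j.$$

Let us apply Lemma 11 with $d = \theta n k$, and weights $a_{ij} = v_i u_j$. Note that we have

$$\|a\|_2^2 = \sum_{i,j} a_{ij}^2 = \Big(\sum_{i=1}^{\theta n} v_i^2\Big)\Big(\sum_{j=1}^{k} u_j^2\Big) = 1$$

since $u$ and $v$ each have unit norm. Consequently, for any fixed $u, v$ in the covers, we have

$$\mathbb{P}[v^T X u > t] \;\leq\; \exp\Big(-\frac{\gamma t^2}{2}\Big).$$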

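By the union bound, we have

$$\mathbb{P}\Big[\max_{u_\alpha \in C_k(\epsilon)}\;\max_{v_\beta \in C_{\theta n}(\epsilon)} v_\beta^T X u_\alpha \geq t\Big]
\;\leq\; M_k(\epsilon)\,M_{\theta n}(\epsilon)\exp\Big(-\frac{\gamma t^2}{2}\Big)
\;\leq\; \exp\Big((k + \theta n)\log(3/\epsilon) - \frac{\gamma t^2}{2}\Big).$$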



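By choosing $\epsilon = \frac{1}{2}$ and $t = \sqrt{\frac{4}{\gamma}(k + \theta n)\log 6}$, we can conclude that

$$s_1(X)/\sqrt{\theta n} = \|X\|_2/\sqrt{\theta n} \;\leq\; \frac{c}{\sqrt{\gamma}}\sqrt{1 + \frac{k}{\theta n}}$$

w.p. $1 - \exp(-(k + \theta n)\log 6)$. Note that

$$\frac{k}{\theta n} \;\leq\; \frac{1}{(2 + \nu)\,\theta\log(p - k)} \;\to\; 0,$$

since $\frac{\log[\theta\log(p-k)]}{\theta\log(p-k)} \to 0$, which implies that $\theta\log(p - k) \to \infty$.
Consequently, we can conclude that

$$\|X\|_2/\sqrt{\theta n} \;\leq\; O(1/\sqrt{\gamma})$$

with probability converging to one as $\theta n, k \to \infty$. Although this bound is essentially correct for a $N(0, 1/\gamma)$ ensemble with $\gamma$
fixed, it is very crude for the sparsified case with $\gamma \to 0$, but will be useful in obtaining tighter control
on $s_1(X)$ and $s_k(X)$ in the sequel.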



D.2 Tightening the bound 



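For a given $u \in S^{k-1}$, consider the random variable $\|Xu\|_2^2 := \sum_{i=1}^{\theta n}(Xu)_i^2$. We claim that
each variate $Z_i = (Xu)_i^2$ is subexponential:

Lemma 12. For any $t > 0$, we have $\mathbb{P}[Z_i > t] \leq 2\exp\big(-\frac{\gamma t}{2}\big)$.

Proof. We can write $(Xu)_i = \sum_{j=1}^{k} X_{ij}u_j$ where $\|u\|_2 = 1$. Hence, from Lemma 11 we have

$$\mathbb{P}\Big[\sum_{j=1}^{k} X_{ij}u_j > \delta\Big] \;\leq\; \exp\Big(-\frac{\gamma\delta^2}{2}\Big).$$

By symmetry, we have $\mathbb{P}[Z_i > t] = \mathbb{P}\big[\,|\sum_{j=1}^{k} X_{ij}u_j| > \sqrt{t}\,\big] \leq 2\exp\big(-\frac{\gamma t}{2}\big)$ as claimed. □

Now consider the event

$$\Big\{\Big|\frac{\|Xu\|_2^2}{\theta n} - 1\Big| > \delta\Big\} = \Big\{\Big|\sum_{i=1}^{\theta n} Z_i - \sum_{i=1}^{\theta n}\mathbb{E}[Z_i]\Big| > \delta\,\theta n\Big\}.$$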



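We may apply Theorem 1.4 of Vershynin [19] with $b = 8\theta n/\gamma^2$ and $d = 2/\gamma$. Hence, we have
$4b/d = 16\theta n/\gamma$, which grows at least linearly in $\theta n$. Hence, for any $\delta > 0$ less than $16\theta n/\gamma$ (we will
in fact take $\delta \to 0$), we have

$$\mathbb{P}\Big[\Big|\frac{\|Xu\|_2^2}{\theta n} - 1\Big| > \delta\Big] \;\leq\; 2\exp\Big(-\frac{\delta^2(\theta n)^2}{256\,\theta n/\gamma^2}\Big) = 2\exp\Big(-\frac{\gamma^2\delta^2\,\theta n}{256}\Big).$$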



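Now take an $\epsilon$-cover of the $k$-dimensional $\ell_2$ ball, say with $N(\epsilon) = (3/\epsilon)^k$ elements. By union
bound, we have

$$\mathbb{P}\Big[\inf_{i=1,\ldots,N(\epsilon)}\frac{\|Xu_i\|_2^2}{\theta n} \leq 1 - \delta\Big] \;\leq\; \exp\Big(-\frac{\gamma^2\delta^2\,\theta n}{256} + k\log(3/\epsilon)\Big).$$

Now set

$$\delta = \frac{\sqrt{2}}{\gamma}\sqrt{\frac{256\,f(k,p)\,k\log(3/\epsilon)}{\theta n}},$$

where $f(k,p) \geq 1$ is a function to be specified. Doing so yields that the infimum is bounded below by
$1 - \delta$ with probability $1 - \exp(-k f(k,p)\log(3/\epsilon))$. (Note that the choice of $f(k,p)$ influences the
rate of convergence, hence its utility.)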
For any element $u \in S^{k-1}$, we have some $u_i$ in the cover, and moreover

$$\begin{aligned}
\big|\,\|Xu\|^2 - \|Xu_i\|^2\big| &= \big|\{\|Xu\| - \|Xu_i\|\}\,\{\|Xu\| + \|Xu_i\|\}\big| \\
&\leq \big|\{\|Xu\| - \|Xu_i\|\}\big|\,(2\|X\|) \\
&\leq (\|X\|\,\|u - u_i\|)\,(2\|X\|) \;\leq\; 2\|X\|^2\epsilon.
\end{aligned}$$

From our earlier result, we know that $\|X\|_2^2 = O(\theta n/\gamma)$ with probability $1 - \exp(-(k + \theta n)\log 6)$.

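Putting together the pieces, we have that the bound

$$\frac{1}{\theta n}\inf_{u \in S^{k-1}}\|Xu\|^2 \;\geq\; 1 - \delta - \frac{c_2}{\gamma}\,\epsilon
= 1 - \frac{\sqrt{2}}{\gamma}\sqrt{\frac{256\,f(k,p)\,k\log(3/\epsilon)}{\theta n}} - \frac{c_2}{\gamma}\,\epsilon,$$

for some constant $c_2 > 0$ independent of $\theta n, k, \gamma$, holds with probability at least

$$\min\{1 - \exp(-k f(k,p)\log(3/\epsilon)),\; 1 - \exp(-(k + \theta n)\log 6)\}. \tag{31}$$

Now set $\epsilon = 3k/(\theta n)$, so that we have w.h.p.

$$\frac{1}{\theta n}\inf_{u \in S^{k-1}}\|Xu\|^2 \;\geq\; 1 - \frac{C}{\gamma}\sqrt{f(k,p)\,\frac{k}{\theta n}\log\Big(\frac{\theta n}{k}\Big)}.$$

(Note that we have utilized the fact that both $\sqrt{f(k,p)\frac{k}{\theta n}\log(\frac{\theta n}{k})}$ and $\frac{k}{\theta n} \to 0$, but the former more
slowly than the latter.)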

Since $k/(\theta n) \to 0$, this quantity will go to zero, as long as $f(k,p)$ remains fixed, or scales slowly
enough. To understand how to choose $f(k,p)$, let us consider the rate of convergence (31). To
establish the claim (30), we need rates fast enough to dominate a $\log(t)$ term in the exponent,
which guides our choice of $f(k,p)$. Recall that we are seeking to prove a scaling of the form
$n = \Theta(k\log(p - k))$, so that our requirement (with $\epsilon = 3k/(\theta n) = \Theta\big(\frac{1}{\theta\log(p-k)}\big)$) is equivalent to the
quantity

$$k f(k,p)\log(3/\epsilon) - \log(t) = k f(k,p)\log[\theta\log(p - k)] - \log(t)$$

tending to infinity. First, if $k > \frac{\log(t)}{\log[\theta\log(p-k)]}$, then we may simply set $f(k,p) = 2$. Otherwise, if
$k \leq \frac{\log(t)}{\log[\theta\log(p-k)]}$, then we may set

$$f(k,p) = \frac{2\log(t)}{k\log[\theta\log(p-k)]}.$$

If $f(k,p) = 2$, then we have

$$f(k,p)\,\frac{k}{\theta n}\log\Big(\frac{\theta n}{k}\Big) = O\Big(\frac{\log[\theta\log(p-k)]}{\theta\log(p-k)}\Big) \;\to\; 0.$$

In the other case, if $k \leq \frac{\log(t)}{\log[\theta\log(p-k)]}$, we have

$$f(k,p)\,\frac{k}{\theta n}\log\Big(\frac{\theta n}{k}\Big) = O\Big(\frac{\log(t)}{k\log[\theta\log(p-k)]}\cdot\frac{\log[\theta\log(p-k)]}{\theta\log(p-k)}\Big) = O\Big(\frac{\log(t)}{k\,\theta\log(p-k)}\Big) \;\to\; 0,$$

which again follows from the assumptions in Lemma 10.

Recalling the definition of $T(\gamma, k, p, \theta, t)$ from Lemma 10, both cases can be
summarized cleanly by saying that with probability greater than $1 - O(1/t^2)$:

$$\frac{1}{\theta n}\inf_{u \in S^{k-1}}\|Xu\|^2 \;\geq\; 1 - \frac{C}{\gamma}\sqrt{\max\Big\{\frac{\log t}{k\,\theta\log(p-k)},\, \frac{\log[\theta\log(p-k)]}{\theta\log(p-k)}\Big\}} = 1 - C\,T(\gamma, k, p, \theta, t).$$

Because $T(\gamma, k, p, \theta, t) \to 0$, for all $p > p_1$, $k > k_1$ we have $C\,T(\gamma, k, p, \theta, t) < 1$. Thus we can take the square
root of both sides and apply the identity $\sqrt{1 + x} = 1 + \frac{x}{2} + o(x)$ (valid for $|x| < 1$) to conclude that,
with probability greater than $1 - O(1/t^2)$,

$$\frac{1}{\sqrt{\theta n}}\inf_{u \in S^{k-1}}\|Xu\| \;\geq\; 1 - \frac{C}{2}\,T(\gamma, k, p, \theta, t) + o(T(\gamma, k, p, \theta, t)).$$

As $T(\gamma, k, p, \theta, t) \to 0$, for all $k > k_2$, $p > p_2$ we have that $|o(T(\gamma, k, p, \theta, t))| \leq \frac{C}{4}\,T(\gamma, k, p, \theta, t)$.
Thus, with probability greater than $1 - O(1/t^2)$,

$$\frac{1}{\sqrt{\theta n}}\inf_{u \in S^{k-1}}\|Xu\| \;\geq\; 1 - \frac{3C}{4}\,T(\gamma, k, p, \theta, t).$$

Note that this same process can be repeated to bound the maximum singular value, yielding
the following result:

$$\frac{1}{\sqrt{\theta n}}\sup_{u \in S^{k-1}}\|Xu\| \;\leq\; 1 + \frac{3C}{4}\,T(\gamma, k, p, \theta, t).$$

Combining these two bounds, we have proved Lemma 10.
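The conclusion of Lemma 10 can be explored empirically: for a tall matrix from the sparsified ensemble, the extreme singular values of $X/\sqrt{\theta n}$ should both lie close to one. The sketch below is an illustration only, with hypothetical dimensions and $\gamma$; it obtains the singular values from the eigenvalues of the small Gram matrix, computed with cyclic Jacobi rotations so that no external library is needed.

```python
import math
import random

def sparsified_matrix(rows, cols, gamma, rng):
    """rows x cols matrix with i.i.d. entries (1 - gamma) * delta_0 + gamma * N(0, 1/gamma)."""
    sd = 1.0 / math.sqrt(gamma)
    return [[rng.gauss(0.0, sd) if rng.random() < gamma else 0.0 for _ in range(cols)]
            for _ in range(rows)]

def jacobi_eigenvalues(A, sweeps=30):
    """Eigenvalues of a small symmetric matrix via cyclic Jacobi rotations."""
    k = len(A)
    A = [row[:] for row in A]
    for _ in range(sweeps):
        for p in range(k - 1):
            for q in range(p + 1, k):
                if abs(A[p][q]) < 1e-15:
                    continue
                theta = 0.5 * math.atan2(2.0 * A[p][q], A[q][q] - A[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for r in range(k):  # update columns p, q: A <- A J
                    arp, arq = A[r][p], A[r][q]
                    A[r][p], A[r][q] = c * arp - s * arq, s * arp + c * arq
                for r in range(k):  # update rows p, q: A <- J^T A
                    apr, aqr = A[p][r], A[q][r]
                    A[p][r], A[q][r] = c * apr - s * aqr, s * apr + c * aqr
    return sorted(A[i][i] for i in range(k))

def scaled_extreme_singular_values(X):
    """Return (s_min, s_max) of X / sqrt(rows), from the eigenvalues of X^T X / rows."""
    rows, k = len(X), len(X[0])
    gram = [[sum(X[i][a] * X[i][b] for i in range(rows)) / rows for b in range(k)]
            for a in range(k)]
    eig = jacobi_eigenvalues(gram)
    return math.sqrt(eig[0]), math.sqrt(eig[-1])

rng = random.Random(0)
X = sparsified_matrix(rows=2000, cols=4, gamma=0.5, rng=rng)
s_min, s_max = scaled_extreme_singular_values(X)
print(f"s_min = {s_min:.3f}, s_max = {s_max:.3f}")
```

Decreasing `gamma` with the dimensions held fixed widens the gap between the two values, in line with the $1/\gamma$ factor in $T(\gamma, k, p, \theta, t)$.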



References 

[1] S. Boyd and L. Vandenberghe. Convex optimization. Cambridge University Press, Cambridge, 
UK, 2004. 

[2] E. Candes and T. Tao. Decoding by linear programming. IEEE Trans. Info Theory, 
51(12):4203-4215, December 2005. 

[3] S. Chen, D. L. Donoho, and M. A. Saunders. Atomic decomposition by basis pursuit. SIAM 
J. Sci. Computing, 20(1):33-61, 1998. 

[4] G. Cormode and S. Muthukrishnan. Towards an algorithmic theory of compressed sensing.
Technical report, Rutgers University, July 2005. 

[5] D. Donoho. For most large underdetermined systems of linear equations, the minimal $\ell_1$-norm
near-solution approximates the sparsest near-solution. Communications on Pure and Applied 
Mathematics, 59(7):907-934, July 2006. 






[6] D. Donoho. For most large underdetermined systems of linear equations, the minimal $\ell_1$-norm
solution is also the sparsest solution. Communications on Pure and Applied Mathematics, 
59(6):797-829, June 2006. 

[7] J. Feldman, T. Malkin, R. A. Servedio, C. Stein, and M. J. Wainwright. LP decoding corrects 
a constant fraction of errors. IEEE Trans. Information Theory, 53(1):82-89, January 2007.

[8] A. Gilbert, M. Strauss, J. Tropp, and R. Vershynin. Algorithmic linear dimension reduction
in the $\ell_1$-norm for sparse vectors. In Proc. Allerton Conference on Communication, Control
and Computing, Allerton, IL, September 2006. 

[9] W. Hoeffding. Probability inequalities for sums of bounded random variables. Journal of the 
American Statistical Association, 58:13-30, 1963. 

[10] I. Johnstone. Chi-square oracle inequalities. In M. de Gunst, G. Klaassen, and A. van der 
Vaart, editors, State of the Art in Probability and Statistics, number 37 in IMS Lecture Notes,
pages 399-418. Institute of Mathematical Statistics, 2001. 

[11] M. Ledoux and M. Talagrand. Probability in Banach Spaces: Isoperimetry and Processes. 
Springer-Verlag, New York, NY, 1991.

[12] J. Matousek. Lectures on discrete geometry. Springer-Verlag, New York, 2002.

[13] N. Meinshausen and P. Bühlmann. High-dimensional graphs and variable selection with the
lasso. Annals of Statistics, 2006. To appear. 

[14] P. Ravikumar, M. J. Wainwright, and J. Lafferty. High-dimensional graph selection using
$\ell_1$-regularized logistic regression. Technical Report 750, UC Berkeley, Department of Statistics,
April 2008. Posted at http://arXiv.org/abs/0804.4202; Conference version appeared at NIPS 
Conference, December 2006. 

[15] G. Rockafellar. Convex Analysis. Princeton University Press, Princeton, 1970. 

[16] S. Sarvotham, D. Baron, and R. G. Baraniuk. Sudocodes: Fast measurement and reconstruc- 
tion of sparse signals. In Int. Symposium on Information Theory, Seattle, WA, July 2006. 

[17] R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical 
Society, Series B, 58(1):267-288, 1996.

[18] J. Tropp. Just relax: Convex programming methods for identifying sparse signals in noise. 
IEEE Trans. Info Theory, 52(3):1030-1051, March 2006. 

[19] R. Vershynin. On large random almost euclidean bases. Acta. Math. Univ. Comenianae, 
LXIX:137-144, 2000.

[20] M. J. Wainwright. Sharp thresholds for high-dimensional and noisy recovery of sparsity
using $\ell_1$-constrained quadratic programs. Technical Report 709, Department of Statistics, UC
Berkeley, 2006. 

[21] M. B. Wakin, J. N. Laska, M. F. Duarte, D. Baron, S. Sarvotham, D. Takhar, K. F. Kelly, 
and R. G. Baraniuk. An architecture for compressive imaging. IEEE Int. Conf. Image Proc, 
pages 1273-1276, 8-11 Oct. 2006. 






[22] W. Wang, M. J. Wainwright, and K. Ramchandran. Information-theoretic limits on sparse sup- 
port recovery: Dense versus sparse measurements. Technical report, Department of Statistics, 
UC Berkeley, April 2008. Short version presented at Int. Symp. Info. Theory, July 2008. 

[23] W. Xu and B. Hassibi. Efficient compressive sensing with deterministic guarantees using 
expander graphs. Information Theory Workshop, 2007. ITW '07. IEEE, pages 414-419, 2-6 
Sept. 2007. 






